ABSTRACT

We  describe  new  techniques  for  model-based  recognition  of  3-D  objects 
from  unknown  viewpoints  using  single  gray  scale  images.  The  objects  in  the 
scene may be overlapping and partially occluded. An efficient matching algorithm, which assumes an affine approximation to the perspective viewing transformation, is proposed. The algorithm has an off-line model preprocessing phase and a recognition phase to reduce matching complexity. The algorithm was successfully tested in recognition of flat industrial objects appearing
in  composite  occluded  scenes. 


1.   Introduction. 

Recognition  of  industrial  parts  and  their  location  in  a  factory  environment  is  a 
major  task  in  robot  vision.  Most  industrial  part  recognition  systems  are  model- 
based  systems  (see  a  survey  in  [C-D]).  The  model  based  approach  is  well  suited  for 
an  industrial  environment,  since  the  objects  processed  by  the  robot  are  usually 
known  in  advance,  and  belong  to  a  certain  subset  of  the  factory's  tools  and  pro- 
ducts. 

We  discuss  the  object  recognition  problem,  where  the  robot  is  faced  with  a 
composite  scene  of  overlapping  parts  (thus  partially  occluding  each  other),  taken 
from  a  data-base  of  known  objects  (e.g.  the  factory's  warehouse).  The  task  is  to 
recognize  the  objects  in  the  scene  and  their  location. 

No  restriction  on  the  viewing  angle  of  the  camera  is  assumed.  In  this  paper  we 
discuss  the  recognition  of  flat  objects  arbitrarily  positioned  in  space.  However,  our 
method  can  be  extended  to  general  objects  and  part  of  our  future  experiments  will 
concentrate  in  this  direction.  The  recognition  is  done  from  2-D  intensity  images. 
The  algorithms  which  we  describe  were  actually  tested  in  a  "real  life  situation"  by 
recognition  of  objects  comprising  composite  scenes  of  industrial  tools,  such  as 
pliers, wrenches, etc. (see Fig. 1-5).

Since we are concerned with recognition of partially occluded objects, no use of
global  features  can  be  made.  We  describe  our  objects  by  a  set  of  local  features. 
These  features  can  be  points,  line  segments,  curve  segments,  etc.  In  this  report  we 
restrict  ourselves  to  the  use  of  points,  which  we  denote  as  'interest  points'.  The 
point sets of the various model objects are matched against the point set of the composite overlapping scene using a small number of corresponding points. Once a basic correspondence has been established, we find the best transformation in the least-squares sense to establish the correct position of the model object in the scene
image.  A  key  factor  in  our  scheme  is  its  division  into  a  preprocessing  stage  and  a 
recognition  stage.  Our  model  point  sets  are  preprocessed  off-line  independently  of 
the  scene  information,  thus  enabling  an  efficient  on-line  recognition  stage.  A  major 
advantage  of  the  proposed  matching  algorithm  is  its  straightforward  parallelism  both 
in  the  preprocessing  and  recognition  stages. 

Much work has been done on recognition of overlapping objects in 2-D scenes; [A-F], [B-C], [G-L], [K-S-S-S], and [T-M-V] are just a partial list of proposed recognition algorithms. Additional examples can be found in [C-D]. [B-J] and [C-D] also give a comprehensive survey of existing 3-D object recognition systems. Recent results which are not mentioned in these surveys and are relevant to this
paper  are  [L],  [T-M],  and  [H-U].  The  SCERPO  system  described  in  [L]  recognizes 
3-D  objects  from  2-D  intensity  scenes.  It  is  viewpoint  invariant,  and  is  based  on 
perceptual  grouping  of  object  features.  At  its  present  stage,  however,  the  SCERPO 
system  is  mainly  suited  for  polyhedral  scenes.  In  [T-M]  a  clustering  approach  is 
used  to  discover  the  transformation  between  the  model  and  the  scene  images.  Our 
task  is  similar  in  its  basic  assumptions  to  that  described  in  [H-U].  We  are  trying  to 
solve  the  same  problem  under  the  same  assumptions.  However,  our  approach  and 
the  approach  in  [H-U]  are  somewhat  complementary.  In  [H-U]  there  is  an  emphasis 
on  the  classification  of  the  model  and  image  features  to  reduce  the  complexity  of 
matching,  while  the  matching  algorithm  itself  is  straightforward.  We,  on  the  other 
hand, consider the case where no such effective classification can be done (this is also the assumption in [T-M]) and, hence, our emphasis is on the development of an efficient feature matching algorithm, which processes the models and the scene images
independently  allowing  fast  recognition.  In  case  feature  classification  is  possible  it 
can  be  incorporated  in  our  algorithm  in  a  natural  way  to  improve  its  efficiency. 

This  paper  reports  preliminary  results  in  a  series  of  experiments  to  develop  a 
3-D  object  recognition  system  from  2-D  images,  based  on  efficient  point,  line,  and 
curve  matching  procedures. 


2.  Definition  of  the  Problem 

We  consider  the  problem  of  object  recognition  in  a  cluttered  3-D  scene.  The 
models  of  the  objects  to  be  recognized  are  assumed  to  be  known  in  advance.  The 
objects  in  the  scene  may  overlap  and  also  be  partially  occluded  by  other  (unknown) 
objects  (see  Fig. 4a).  We  allow  the  image  to  be  obtained  from  an  arbitrary 
viewpoint.  At  this  stage  we  will  assume  that  we  are  dealing  with  flat  objects.  These 
initial  assumptions  are  similar  to  those  in  [H-U].  We  also  assume  that  the  depth  of 
the  centroids  of  the  objects  in  the  scene  is  large  compared  to  the  focal  length  of  the 
camera, and that the depth variation of the objects is small compared to the depth
of  their  centroids.  Under  these  assumptions  it  is  well  known  that  the  perspective 
projection  is  well  approximated  by  a  parallel  (orthographic)  projection  with  a  scale 
factor  (see  for  example  p. 79  in  [K],  [H],  or  [T-M]).  Hence,  two  different  images  of 
the same flat object are in an affine 2-D correspondence, namely there is a nonsingular 2×2 matrix A and a 2-D (translation) vector b, such that each point x in the first image is mapped to the corresponding point Ax + b in the second image.

Our  problem  is  to  recognize  the  objects  in  the  scene,  and  for  each  recognized 
object  to  find  the  affine  transformation  that  gives  the  best  least-squares  fit  (see  Sec- 
tion 5  for  details)  between  the  model  of  the  object  and  its  transformed  image  in  the 
scene. 

3.  Choice  of  'Interest  Points' 

Our  matching  algorithm,  which  is  described  in  the  next  section,  extracts,  so 
called,  'interest  points',  both  in  the  object  model  images  and  in  the  scene  image  to 
find the best match between those point sets. We do not try, at this stage, to optimize the point extraction methods. These should be data base dependent, so that different data bases of models will suggest different natural 'interest points'. For example, a data base of polyhedral objects naturally suggests the use of polyhedra vertices as 'interest points', while 'curved' objects suggest the use of sharp convexities, deep concavities and, maybe, zero curvature points. 'Interest points' do not
have  to  appear  physically  in  the  image.  For  example,  a  point  may  be  taken  as  the 
intersection  of  two  non-parallel  line  segments,  which  are  not  necessarily  touching. 
An  'interest  point'  does  not  necessarily  have  to  correspond  to  a  geometrical  feature. 
An  'interest  operator'  based  on  high  variance  in  intensity  is  described  in  [M]  and 
was  used  in  [B-T]. 

Our  basic  assumption  is  that  enough  'interest  points'  can  be  extracted  in  the 
relevant  images.   We  assume  no  special  classification  of  these  points. 

In our experiments we used points of sharp convexities and deep concavities (see Fig. 1c, 1d, 2b, 3b, 4b, 5b).

4.   Recognition  of  a  Single  Model  in  a  Scene 

For  the  sake  of  clarity  we  describe  our  algorithm  in  the  simpler  situation, 
where  the  data  base  consists  only  of  one  model.  However,  the  presentation  given 
here  applies  to  the  general  case  where  a  number  of  models  may  appear  in  the  scene. 

It  is  well  known  that  an  affine  transformation  of  the  plane  is  uniquely  defined 
by  the  transformation  of  three  non-collinear  points  (see,  for  example,  [Y-A]). 
Moreover,  there  is  a  unique  affine  transformation,  which  maps  any  non-collinear 
triplet in the plane to another non-collinear triplet. Hence, we may extract 'interest points' on the model and the scene, and try to match non-collinear triplets of such points to obtain candidate affine transformations. Each such transformation can be checked by matching the transformed model against the scene. This
is  also  the  basic  approach  in  [H-U]. 

However, the complexity of such a scheme is quite unfavorable. Given m points in the model and n points in the scene, the worst case complexity is $(m \times n)^3 \times t$, where t is the complexity of matching the model against the scene. If we assume that m and n are of the same magnitude, and t is at least of magnitude m, the worst case complexity is of order $n^7$. One way to reduce this complexity ([H-U]) is to classify the points in a distinctive way, so that each triplet can match only a small number of other triplets. We consider, however, the situation where such a distinction does not exist or cannot be made in a reliable way (see [T-M]).
Hence,  we  present  a  more  efficient  triplet  matching  algorithm.  Also,  in  Section  7, 
we  show  how,  in  special  cases,  the  complexity  can  be  reduced  by  using  certain 
affine  metric  invariants. 

The  algorithm  consists  of  two  major  steps.  The  first  one  is  a  preprocessing 
step  which  is  applied  to  the  model  points.  This  step  does  not  use  any  information 
about the scene and is executed off-line before actual matching is attempted. The
second  step,  matching  proper,  uses  the  data  prepared  by  the  first  step  to  match  the 
models  against  the  scene.  The  execution  time  of  this  second  step  is  the  actual 
recognition  time. 

In order to separate the computation into two such independent steps we have to represent the model and scene point information in a way which is both independent, and still allows comparison of corresponding triplets. The fact that an affine transformation is uniquely defined by the transformation of three non-collinear points can also be expressed as follows. Consider a set of m points and pick any ordered subset of three non-collinear points. The two linearly independent vectors based on these points are a 2-D linear basis. One can express the coordinates of all the model points in this basis. (The basis points will have the coordinates (0,0), (1,0), (0,1) respectively.) Any affine transformation applied to the set of points will not change the set of coordinates based on the same ordered basis triplet. Specifically, let $e_{00}$, $e_{10}$, $e_{01}$ be an ordered affine basis triplet in the plane. Then the affine coordinates $(\xi, \eta)$ of a point v are given by

$$ v = \xi (e_{10} - e_{00}) + \eta (e_{01} - e_{00}) + e_{00} $$

Application of an affine transformation T will transform the point v to

$$ Tv = \xi (Te_{10} - Te_{00}) + \eta (Te_{01} - Te_{00}) + Te_{00} $$

Hence Tv has the same coordinates $(\xi, \eta)$ in the basis triplet $Te_{00}$, $Te_{10}$, $Te_{01}$.

Our  algorithm  will  efficiently  compare  these  sets  of  coordinates  belonging  to 
different  bases. 
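
As a small illustration of this invariance, the following Python sketch computes the affine coordinates of a point in a given basis triplet and checks that they are unchanged when one affine map is applied to all points. The function names `affine_coords` and `apply_affine`, the sample points, and the use of exact rational arithmetic on integer inputs are assumptions made for this example only.

```python
from fractions import Fraction

def affine_coords(p, e00, e10, e01):
    """Return (xi, eta) such that p = xi*(e10 - e00) + eta*(e01 - e00) + e00."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    px, py = p[0] - e00[0], p[1] - e00[1]
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("basis triplet is collinear")
    # Cramer's rule for the 2x2 system; inputs are assumed to be integers.
    return Fraction(px * by - py * bx, det), Fraction(ax * py - ay * px, det)

def apply_affine(A, b, p):
    """Apply the affine map p -> A p + b to a 2-D point."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])

basis = [(0, 0), (4, 0), (0, 2)]      # hypothetical basis triplet
v = (2, 3)                            # hypothetical model point
A, b = [[2, 1], [1, 3]], (5, -7)      # an arbitrary nonsingular affine map

before = affine_coords(v, *basis)
after = affine_coords(apply_affine(A, b, v),
                      *[apply_affine(A, b, e) for e in basis])
print(before, after)
```

Both calls yield the same pair of coordinates, which is what allows model and scene coordinates to be compared without knowing the transformation.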

A)  Preprocessing 

Assume we are given an image of a model, where m 'interest points' have been extracted. For each ordered non-collinear triplet of model points the coordinates of all other $m-3$ model points are computed taking this triplet as an affine basis of the 2-D plane. Each such coordinate (after a proper quantization) is used as an entry to a hash-table, where we record the number of the basis-triplet at which the coordinate was obtained and the number of the model (in case of more than one model). The complexity of this preprocessing step is of order $m^4$ per model. New models added to the data-base can be processed independently without recomputing the hash-table (except when we must change its size by re-hashing).

B)  Recognition 

In  the  recognition  stage  we  are  given  an  image  of  a  scene,  where  n  'interest 
points'  have  been  extracted.  We  choose  an  arbitrary  ordered  triplet  in  the 
scene  and  compute  the  coordinates  of  the  scene  points  taking  this  triplet  as  an 
affine  basis.  For  each  such  coordinate  we  check  the  appropriate  entry  in  the 
hash-table,  and  for  every  pair  (model  number,  basis-triplet  number),  which 
appears  there,  we  tally  a  vote  for  the  model  and  the  basis-triplet  as 
corresponding to the triplet which was chosen in the scene. (If there is only one model, we have to vote for the basis triplet alone.)

If  a  certain  pair  (model,  basis-triplet)  scores  a  large  number  of  votes,  we  decide 
that  this  triplet  corresponds  to  the  one  chosen  in  the  scene.  The  uniquely 
defined  affine  transformation  between  these  triplets  is  assumed  to  be  the 
transformation  between  the  model  and  the  scene.  If  the  current  triplet  does  not 
score  high  enough,  we  pass  to  another  basis-triplet  in  the  scene. 
For  the  algorithm  to  be  successful  it  is  enough,  theoretically,  to  pick  three 
non-collinear  points  in  the  scene,  belonging  to  one  model.  The  voting  process, 
per  triplet,  is  linear  in  the  number  of  points  in  the  scene.  Hence,  the  overall 
recognition  time  is  dependent  on  the  number  of  model  points  in  the  scene,  and 
the  number  of  additional  'interest  points'  which  belong  to  the  scene,  but  did  not 
appear on any of the models. Although, in the worst case, we might have an order of $n^4$ operations, in most cases, especially when the number of models is small, the algorithm will be much faster. For example, if there are k model points in a scene of n points, then the probability of not choosing a model triplet in t trials is approximately

$$ p = \left(1 - \left(\frac{k}{n}\right)^3\right)^t $$

Hence, for a given $\epsilon \ll 1$, if we assume a lower bound on the 'density' $d = k/n$ of model points in a scene, then the number of trials t giving $p < \epsilon$ is of order

$$ \frac{\log \epsilon}{\log(1 - d^3)} $$

which is a constant independent of n. Since the verification process is linear in n, we have, in this case, an algorithm of complexity $O(n)$, which will succeed with probability of at least $1 - \epsilon$.
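
To get a feeling for the size of this constant, the bound can be evaluated numerically. The sketch below is only an illustration: the function names and the sample values d = 0.5 and epsilon = 0.01 are assumptions, not figures from our experiments.

```python
import math

def miss_probability(k, n, t):
    """Approximate probability of never picking a model triplet in t trials."""
    return (1.0 - (k / n) ** 3) ** t

def trials_needed(d, eps):
    """Smallest t with (1 - d**3)**t <= eps, for 0 < d < 1 and 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(1.0 - d ** 3))

t = trials_needed(0.5, 0.01)
print(t, miss_probability(1, 2, t))
```

With half of the scene points belonging to the model, a few dozen basis triplets suffice for a 99% chance of hitting at least one model triplet, regardless of n.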

This  method  assumes  no  a-priori  classification  of  the  model  and  scene  points  to 
achieve  matching  candidates.  If  such  information  is  available,  it  can  be  incorporated 
into  our  method  by  assigning  weights  to  the  correspondence  of  different  triplets  to 
the  model,  and  by  checking  the  triplets  in  an  appropriate  order. 

Numerical  errors  in  the  point  coordinates  are  more  severe  when  the  basis 
points  are  close  to  each  other  compared  to  the  other  model  points  in  the  scene.  To 
overcome this problem, we may introduce the following procedure. If a certain basis triplet gets a number of votes which, on one hand, is not enough to accept it as a 'candidate' basis but, on the other hand, does not justify total rejection, we may replace this triplet by another triplet consisting of points which were among the 'voting' coordinates and are more distant from each other than the previous basis points. In the correct case this procedure will result in a growing match, as the numerical errors become less significant. Even if a basis-triplet belonging to some model did not get enough votes due to noisy data, we still have a chance to recover this model from another basis-triplet.

A  major  potential  advantage  of  the  suggested  algorithm  is  its  high  inherent 
parallelism.  Parallel  implementation  of  this  algorithm  is  straightforward;  moreover, 
it  should  be  quite  easy  to  build  a  special  device  for  this,  implementing  it  at  very  high 
speed. 

5.   Finding  the  Best  Least-Squares  Match 

As  we  have  mentioned  before,  an  affine  transformation  in  the  plane  is  uniquely 
defined  by  the  transformation  of  three  non-collinear  points.  However,  in  practical 
applications  this  transformation  may  be  somewhat  distorted,  because  of  noisy  com- 
putation of  these  three  points.  Knowledge  of  additional  points,  which  were 
transformed  to  each  other  may  help  us  to  improve  the  accuracy  of  the  computed 
transformation.  In  this  section,  we  show  how  to  compute  an  affine  transformation 
giving  the  best  least-squares  match  between  sets  of  points  which  are  transformed  to 
each  other.  In  [S-S]  this  problem  was  solved  for  Euclidean  transformations.  Our 
presentation  uses  the  same  method  and  generalizes  it  to  affine  transformations. 

Specifically, assume that we are looking for an affine match between the sequences of planar points $(u_j)_{j=1}^n$ and $(v_j)_{j=1}^n$. We would like to find the affine transformation $Tu = Au + b$ of the plane which will minimize the $l^2$ distance between the sequences $(Tu_j)_{j=1}^n$ and $(v_j)_{j=1}^n$:

$$ \delta = \min_{T} \sum_{j=1}^{n} |Tu_j - v_j|^2 . $$

To simplify the calculation, first translate the set $(u_j)$ so that

$$ \sum_{j=1}^{n} u_j = 0 . $$

Then

$$ \delta = \min_{A,b} \sum_{j=1}^{n} |Au_j + b - v_j|^2 = \min_{A,b} \left[ \sum_{j=1}^{n} |b - v_j|^2 + \sum_{j=1}^{n} |Au_j|^2 + 2 \sum_{j=1}^{n} b \cdot Au_j - 2 \sum_{j=1}^{n} Au_j \cdot v_j \right] . $$

But

$$ \sum_{j=1}^{n} b \cdot Au_j = b \cdot A \left( \sum_{j=1}^{n} u_j \right) = 0 . $$

Hence b and A appear independently in $\delta$ and we can minimize their contributions separately.

To minimize over b we simply put

$$ b = \frac{1}{n} \sum_{j=1}^{n} v_j . $$

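As to $A = (a_{ij})$, $(i = 1,2;\ j = 1,2)$, denote

$$ g(A) = g(a_{11}, a_{12}, a_{21}, a_{22}) = \sum_{j=1}^{n} |Au_j|^2 - 2 \sum_{j=1}^{n} Au_j \cdot v_j . $$

We have to find

$$ \min_{A} g(A) = \min g(a_{11}, a_{12}, a_{21}, a_{22}) . $$

To find this minimum one has to solve the following system of 4 equations $(i = 1,2;\ j = 1,2)$:

$$ \frac{\partial g}{\partial a_{ij}} = 0 \qquad (*) $$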


Since $g$ is a quadratic function in each of its unknowns, (*) is a system of four linear equations with four unknowns. (Actually, these happen to be two independent sets of two linear equations with two unknowns.) Omitting the tedious details, we present the final solution of this system.

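For $i = 1,2$ define the following four n-dimensional vectors:

$$ U^i = (u_j^i)_{j=1}^{n} , \qquad V^i = (v_j^i)_{j=1}^{n} , $$

where $u_j^i$ and $v_j^i$ denote the i-th coordinates of $u_j$ and $v_j$.

Then the solution of (*) is given by

$$ a_{11} = \frac{(U^1 \cdot V^1)(U^2 \cdot U^2) - (U^2 \cdot V^1)(U^1 \cdot U^2)}{\Delta} $$

$$ a_{12} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^1) - (U^1 \cdot V^1)(U^1 \cdot U^2)}{\Delta} $$

$$ a_{21} = \frac{(U^1 \cdot V^2)(U^2 \cdot U^2) - (U^2 \cdot V^2)(U^1 \cdot U^2)}{\Delta} $$

$$ a_{22} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^2) - (U^1 \cdot V^2)(U^1 \cdot U^2)}{\Delta} $$

where

$$ \Delta = (U^1 \cdot U^1)(U^2 \cdot U^2) - (U^1 \cdot U^2)(U^1 \cdot U^2) . $$

As we can see, $\Delta$ depends only on one set of points (in this case the model points), so we can know in advance which sets of model points will give a solution for the minimum.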

In  Fig. 3c  we  see  an  example  of  a  fit  obtained  by  calculating  the  affine  transfor- 
mation from  three  basis  points,  and  in  Fig. 3d  the  same  model  is  fitted  using  the  best 
least-squares  affine  match,  based  on  10  points,  all  of  which,  by  the  way,  were 
recovered  as  corresponding  points  by  the  transformation  in  Fig. 3c. 
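
The closed-form solution of this section can be sketched in a few lines of Python. This is a minimal illustration that assumes paired lists of 2-D points; the function name `fit_affine`, the tolerance on $\Delta$, and the sample data are hypothetical. The translation is returned in the original (untranslated) model coordinates.

```python
def fit_affine(us, vs):
    """Least-squares affine fit v ~ A u + b for paired 2-D points."""
    n = len(us)
    # Translate the u points so that their sum is zero.
    cx = sum(u[0] for u in us) / n
    cy = sum(u[1] for u in us) / n
    U1 = [u[0] - cx for u in us]
    U2 = [u[1] - cy for u in us]
    V1 = [v[0] for v in vs]
    V2 = [v[1] for v in vs]

    def dot(p, q):
        return sum(x * y for x, y in zip(p, q))

    delta = dot(U1, U1) * dot(U2, U2) - dot(U1, U2) ** 2
    if abs(delta) < 1e-12:  # hypothetical tolerance for collinear model points
        raise ValueError("model points are (nearly) collinear")
    a11 = (dot(U1, V1) * dot(U2, U2) - dot(U2, V1) * dot(U1, U2)) / delta
    a12 = (dot(U1, U1) * dot(U2, V1) - dot(U1, V1) * dot(U1, U2)) / delta
    a21 = (dot(U1, V2) * dot(U2, U2) - dot(U2, V2) * dot(U1, U2)) / delta
    a22 = (dot(U1, U1) * dot(U2, V2) - dot(U1, V2) * dot(U1, U2)) / delta
    # For the centered points b is the centroid of v; undo the centering.
    bx = sum(V1) / n - (a11 * cx + a12 * cy)
    by = sum(V2) / n - (a21 * cx + a22 * cy)
    return [[a11, a12], [a21, a22]], (bx, by)

# Noise-free data generated with A = [[2, 1], [-1, 3]] and b = (4, -5).
us = [(0, 0), (1, 0), (0, 1), (2, 3)]
vs = [(4, -5), (6, -6), (5, -2), (11, 2)]
A_fit, b_fit = fit_affine(us, vs)
print(A_fit, b_fit)
```

On noise-free correspondences the fit reproduces the generating transformation up to rounding; with noisy points it returns the least-squares compromise over all pairs.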

6.   Summary  of  the  Algorithm 

Our  algorithm  can  be  summarized  as  follows: 

A      Represent  the  model  objects  by  sets  of  'interest  points'. 

B  For  each  non-collinear  triplet  of  model  points  compute  the  coordinates  of  all 
the  other  model  points  according  to  this  basis  triplet  and  hash  these  coordinates 
into  a  table  which  stores  all  the  pairs  (model  number,  basis  triplet  number)  for 
every  coordinate. 

C  Given  an  image  of  a  scene  extract  its  interest  points,  choose  a  triplet  of  non- 
collinear  points  as  a  basis  triplet  and  compute  the  coordinates  of  the  other 
points  in  this  basis.  For  each  such  coordinate  vote  for  the  pairs  (model  number, 
basis  triplet  number),  and  find  the  pairs  which  obtained  the  most  coincidence 
votes.  If  a  certain  pair  scored  a  large  number  of  votes,  decide  that  its  model 
and  basis  triplet  correspond  to  the  one  chosen  in  the  scene.  If  not,  continue  by 
checking  another  basis  triplet. 

D  For  each  candidate  model  and  basis  triplet  from  the  previous  step,  establish  a 
correspondence  between  the  model  points  and  the  appropriate  scene  points, 
and  find  the  affine  transformation  giving  the  best  least-squares  match  for  these 
corresponding  sets.  If  the  least-squares  difference  is  too  big  go  back  to  Step  C 
for  another  candidate  triplet.  Finally,  the  transformed  model  is  compared  with 
the  scene  (this  time  we  are  considering  not  only  previously  extracted  'interest 
points').  If  this  comparison  gives  a  bad  result  go  back  again  to  Step  C.  (  In 
our  experiments  we  compared  the  boundaries  of  our  objects  at  equally  spaced 
sample  points.) 




This  is  a  short  summary  of  the  basic  algorithm.  Of  course,  various  improve- 
ments can  be  incorporated  in  its  different  steps,  e.g.,  the  complexity  reduction 
described  in  Section  7. 
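
Steps B and C above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in our experiments: points have integer coordinates so that exact rational arithmetic can be used, the quantization step `STEP` is an arbitrary choice, and all names (`build_table`, `vote`, the model label "pliers") are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

STEP = Fraction(1, 4)  # hypothetical quantization step of the hash bins

def affine_coords(p, basis):
    """Coordinates of p in the affine basis (e00, e10, e01); None if collinear."""
    e00, e10, e01 = basis
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    px, py = p[0] - e00[0], p[1] - e00[1]
    det = ax * by - ay * bx
    if det == 0:
        return None
    return Fraction(px * by - py * bx, det), Fraction(ax * py - ay * px, det)

def quantize(c):
    return (round(c[0] / STEP), round(c[1] / STEP))

def build_table(models):
    """Step B: hash all model points in every ordered non-collinear triplet."""
    table = defaultdict(list)
    for model_id, pts in models.items():
        for idx in permutations(range(len(pts)), 3):
            basis = [pts[i] for i in idx]
            for j, p in enumerate(pts):
                if j in idx:
                    continue
                c = affine_coords(p, basis)
                if c is None:  # collinear triplet, skip it
                    break
                table[quantize(c)].append((model_id, idx))
    return table

def vote(table, scene, idx):
    """Step C: tally votes for (model, basis triplet) pairs for one scene basis."""
    basis = [scene[i] for i in idx]
    votes = defaultdict(int)
    for j, p in enumerate(scene):
        if j in idx:
            continue
        c = affine_coords(p, basis)
        if c is None:
            break
        for entry in table.get(quantize(c), ()):
            votes[entry] += 1
    return votes

models = {"pliers": [(0, 0), (4, 0), (0, 4), (4, 4), (2, 6)]}
table = build_table(models)
# Scene: the model under (x, y) -> (2x + y + 10, -x + 3y + 5), plus two clutter points.
scene = [(30, 2), (10, 5), (18, 1), (14, 17), (22, 13), (11, 40), (20, 21)]
votes = vote(table, scene, (1, 2, 3))
print(votes[("pliers", (0, 1, 2))])
```

Here the scene basis consists of the images of model points 0, 1 and 2, so the pair ("pliers", (0, 1, 2)) collects one vote from each of the two remaining model points. With real image data the coordinates are noisy floating-point values, and the bin size has to reflect the noise level.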

7.   Reduction  of  Complexity  using  Affine  Invariants

When  the  number  of  'interest  points'  on  the  models  is  large,  various  affine 
invariants  can  be  exploited  to  reduce  the  complexity  of  the  method  presented  in  Sec- 
tion 4.   We  give  one  such  example.    We  will  use  the  following 

Lemma  (see,  for  example,  p. 73  in  [K])  :  Two  straight  lines  which  correspond  in 
an  affine  transformation  are  'similar',  i.e.  corresponding  segments  on  the  two 
lines  have  the  same  length  ratio. 

Moreover,  the  same  statement  holds  for  sets  of  parallel  lines.  Hence,  if  we  have  a 
set  of  points,  which  are  located  on  parallel  lines  in  a  model,  and  another  set  of 
points  on  parallel  lines  in  the  scene,  we  can  efficiently  check  the  conjecture,  that 
some  of  these  points  correspond. 

Let  us  see  how  the  method  described  in  Section  4  can  be  reduced  for  the  case 
of  a  one  dimensional  line. 

We  have  again  two  major  steps. 

A)  Model  Preprocessing 

Extract  the  'interest  points'  on  the  model  and  find  those  of  them  which  are 
positioned  on  the  same  line.  (A  point  may  belong  to  different  lines.)  Take  a 
pair  of  points  on  a  line  and  compute  the  coordinates  of  all  other  model  points 
on  this  line  taking  this  pair  as  the  standard  one-dimensional  basis  of  the  line. 
Each  such  coordinate  is  used  as  an  entry  to  a  hash-table,  where  we  record  the 
number  of  the  basis-pair  at  which  the  coordinate  was  obtained,  the  number  of 
the  line,  and  the  number  of  the  model. 

B)  Recognition 

Extract  sets  of  points  positioned  on  the  same  line  in  the  image.  Choose  a  pair 
of  points  on  such  a  line  as  a  basis  and  compute  the  coordinates  of  the  other 
points  on  the  same  line  according  to  this  basis.  For  each  such  coordinate  check 
the  appropriate  entry  in  the  hash-table,  and  vote  for  every  triple  of  (model 
number,  line  number,  basis-pair  number),  which  appears  there.  A  triple  that 
scores  a  large  number  of  votes  gives  the  correspondence  between  the  points  on 
the  appropriate  lines.  Correspondence  of  three  non-collinear  points  (obtained 
from different lines, of course) already gives a full affine basis, and we proceed as before.

Since here we have to choose only two points as a basis and not three, the worst case complexity is reduced by a factor of n.
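
The one-dimensional coordinate used here is the ratio in the Lemma. The short Python sketch below computes it for a point on a line with respect to a basis pair and checks that it survives an affine map. It is an illustration only; the name `line_coord`, the integer sample points, and the chosen map are assumptions.

```python
from fractions import Fraction

def line_coord(p, e0, e1):
    """Coordinate t of p on the line through e0 and e1, with p = e0 + t*(e1 - e0).

    Assumes p lies on that line and that all inputs are integers.
    """
    dx, dy = e1[0] - e0[0], e1[1] - e0[1]
    if dx == 0 and dy == 0:
        raise ValueError("basis pair must consist of two distinct points")
    return Fraction((p[0] - e0[0]) * dx + (p[1] - e0[1]) * dy, dx * dx + dy * dy)

def apply_affine(p):
    """A hypothetical affine map (x, y) -> (3x - y + 2, x + 2y - 4)."""
    return (3 * p[0] - p[1] + 2, p[0] + 2 * p[1] - 4)

e0, e1, p = (0, 1), (2, 5), (5, 11)   # three points on the line y = 2x + 1
t_model = line_coord(p, e0, e1)
t_scene = line_coord(apply_affine(p), apply_affine(e0), apply_affine(e1))
print(t_model, t_scene)
```

Both coordinates agree, so the same hash-table and voting scheme can be applied to basis pairs on a line.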

8.  Recognition  of  Objects  under  Similarity  Transformation 

In  numerous  vision  problems  we  are  confronted  with  the  problem  of  recogni- 
tion of  objects  which  have  undergone  a  similarity  transformation,  namely  rotation, 
translation,  and  scale.  This  is  the  situation,  when  the  viewing  angle  of  the  camera  is 
the  same  both  for  the  model  and  the  image  of  a  scene.  Such  conditions  can  be 
achieved,  for  example,  in  a  factory  environment  where  the  viewing  angle  of  a  cam- 
era on  a  conveyor  belt  can  be  kept  constant. 

Our algorithm is obviously applicable to the case of a similarity transformation, since it is a special case of an affine transformation. Moreover, since the similarity case is simpler than the general affine case, the complexity of both the preprocessing and recognition stages can be reduced. The key observation here is that since a similarity transformation preserves angles, two points are enough to form a basis which spans the 2-D plane. (The first point is assigned coordinates (0,0) and the second (1,0). The third basis point (0,1) is uniquely defined by these two points.) Hence, we may repeat the procedure described in Section 4 using basis pairs instead of basis triplets. This reduces the complexity of the preprocessing step by a factor of m, and the worst case complexity of the recognition step by a factor of n.
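
One convenient way to sketch the two-point basis is to treat planar points as complex numbers, so that a similarity transformation (rotation, scale and translation, without reflection) becomes z -> a*z + b with a nonzero complex a. This representation and the names below are assumptions made for the illustration, not part of the algorithm as stated above.

```python
def similarity_coords(p, p0, p1):
    """Coordinates of p in the basis where p0 -> (0,0) and p1 -> (1,0).

    Points are complex numbers; the result's real and imaginary parts are
    the two coordinates.
    """
    if p1 == p0:
        raise ValueError("basis pair must consist of two distinct points")
    return (p - p0) / (p1 - p0)

def apply_similarity(a, b, z):
    """Rotation and scale by the complex factor a, then translation by b."""
    return a * z + b

p0, p1, p = 0 + 0j, 2 + 0j, 1 + 3j     # hypothetical model points
a, b = 2j, 3 - 1j                      # rotate by 90 degrees, scale by 2, translate

c_model = similarity_coords(p, p0, p1)
c_scene = similarity_coords(apply_similarity(a, b, p),
                            apply_similarity(a, b, p0),
                            apply_similarity(a, b, p1))
print(c_model, c_scene)
```

The coordinates coincide for the model and the transformed points, so hashing and voting proceed exactly as in Section 4 with basis pairs in place of basis triplets.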

9.  Line  Matching 

In the previous sections we dealt with point matching algorithms. However, extraction of points might be quite noisy. A line is a more stable feature than a point. Thus in scenes where lines can be extracted in a reliable way, e.g. scenes of polyhedral objects, we might be interested in applying similar procedures to lines.

All  the  point  matching  techniques  of  Section  4  apply  directly  to  lines,  since 
lines  can  be  viewed  as  points  in  the  dual  space.  Thus  three  lines  which  have  no 
parallel  pair  are  a  basis  of  the  affine  space,  each  line  has  unique  coordinates  in  this 
basis,  and  we  repeat  exactly  the  matching  procedure  of  Section  4. 

We  can  also  make  use  of  line  segments  to  reduce  the  complexity  of  the  match- 
ing algorithm  of  Section  4.  If  the  endpoints  of  line  segments  can  be  reliably 
extracted,  then  instead  of  a  triplet  of  points  or  lines  as  a  basis,  we  can  take  a  line 
segment plus an additional point. The reduction of complexity is significant.

Since an affine transformation maps collinear points into collinear points and intersection points of lines into the intersection points of the corresponding lines, we may develop
algorithms  which  combine  point  and  line  information.  For  example,  even  if  the 
algorithm  utilizes  point  triplets  as  an  affine  basis,  the  verification  can  be  done  not 
only  on  other  'interest  point'  coordinates,  but  also  on  line  equations,  etc. 

10.   Experimental  Results 

We have done several experiments of recognizing models of two different pliers (Fig. 1a and 1b) in several composite scenes (see for example Fig. 2, 4, 5).

Fig. 1a and 1b are the original images of two models (pliers), and Fig. 1c and 1d show the extracted 'interest points' of the models, which are points of sharp concavities and convexities. In Fig. 3a we see an image of the plier of Fig. 1a rotated, translated and tilted at about 40 degrees (observe the different lengths of both handles in the image). The recognition algorithm was performed to obtain a number of
matching  basis-triplets.  The  corresponding  affine  transformations  were  calculated 
and  for  each  such  transformation  the  transformed  model  was  superimposed  on  the 
scene  of  Fig. 3a.  Fig. 3c  shows  such  a  transformation  computed  according  to  a  basis 
triplet  which  gives  a  somewhat  noisy  match.  This  solution  is  significantly  improved 
by  the  best  least-squares  match  which  is  given  in  Fig. 3d,  and  was  calculated  using  all 
the  points  which  were  recognized  as  model  points  by  the  basis  triplet  of  Fig. 3c  (see 
Section  5). 

In  Fig. 2  we  see  an  image  of  a  composite  overlapping  scene  of  both  pliers, 
which  was  also  significantly  tilted,  its  extracted  'interest  points',  and  the  recognition 
results.  Note  that  in  Fig. 2b  we  have  additional  'interest  points'  which  are  created  by 
the  superposition  of  the  two  objects.  These  points  do  not  correspond  to  the  'interest 
points'  of  the  original  models.  Also,  one  can  see  that  a  number  of  the  original 
'interest  points'  are  occluded  in  the  scene. 

To give an intuitive feeling of our algorithm's performance we include some statistics on the example of Fig. 2. The total number of 'interest points' in the scene of Fig. 2b is 28. 16 of them are unoccluded model points of the second plier out of 21 original model points (see Fig. 1d). To get the statistics we ran our recognition algorithm on all the possible basis triplets of Fig. 2b. For each triplet we found the set of best (maximum vote) matching model triplets. The number of points identified by such a triplet as model points is the so-called no. of votes in the first column of the table given below. The second column gives the number of triplets which obtained these votes, and the third column gives the number of triplets which were verified as belonging to the model (correct triplets). The results are summarized in the following table:


no. of votes    no. of basis triplets    correct basis triplets

    14+                   0                        0
    13                    4                        4
    12                   11                       10
    11                   12                        8
    10                   29                       12
     9                   56                       22
     8                  145                       38
     7                  287                       62
     6                  805                      151

Remarks: 

a)  Since we have 16 model points in the scene, we expect a maximum of 13 votes for a correct triplet (the 16 model points minus the 3 basis points).

b)  Since all 6 ordered occurrences of the same unordered triplet will give the same voting result, we counted unordered triplets in our statistics. In the algorithm we are dealing with ordered triplets; thus, for example, we have 4 × 6 = 24 ordered basis triplets with the maximal number of votes.

Fig. 4 and 5 give additional recognition examples. These examples include additional occluding objects which do not belong to the model data base. Also, in the example of Fig. 5, in addition to the tilt there was a significant change in the distance of the camera from the scene.

At  this  stage  of  our  experiments  no  effort  was  made  to  optimize  various 
parameters  of  the  algorithm  such  as  proper  quantization  of  the  hash-table  to  reflect 
the  noise  model,  or  extension  of  the  basic  point  match  as  explained  at  the  end  of 
Section  4. 

Figure 1: The models of the two pliers and their extracted 'interest points'.

Figure 2: a) A composite scene of the pliers of Fig. 1 (observe different lengths of handles due to the tilt). b) Extracted 'interest points'. c) Recognition of the models in the scene.

Figure 3: a) The plier rotated and tilted in space (see different length of handles). b) Extracted 'interest points'. c) Matching based on one 'basis triplet' correspondence. d) Best least squares affine correspondence.

Figure 4: a) Composite scene of the pliers of Fig. 1 with an additional occluding object. b) Extracted 'interest points'. c) Recognition of the models.

Figure 5: a) A composite scene of one of the pliers of Fig. 1 with additional occluding objects (observe the change in scale). b) Extracted 'interest points'. c) Recognition of the model.